Compaction and tensile forces determine the accuracy of folding landscape 
parameters from single molecule pulling experiments 

We establish a framework for assessing whether the transition state location of a biopolymer, which
can be inferred from single molecule pulling experiments, corresponds to the ensemble of structures
that have equal probability of reaching either the folded or unfolded states ($P_{\mathrm{fold}} = 0.5$). Using
results for the forced unfolding of an RNA hairpin, an exactly soluble model, and an analytic theory,
we show that $P_{\mathrm{fold}}$ is solely determined by $s$, an experimentally measurable molecular tensegrity
parameter, which is a ratio of the tensile force and a compaction force that stabilizes the folded state.
Applications to folding landscapes of DNA hairpins and a leucine zipper with two barriers provide
a structural interpretation of single molecule experimental data. Our theory can be used to assess
whether molecular extension is a good reaction coordinate using measured free energy profiles.

The response of biopolymers to mechanical force ($f$) at the single molecule level has produced
direct estimates of many features of their folding landscapes, which in turn have given a deeper
understanding of how proteins and RNA fold. In particular, single molecule pulling experiments
directly measure the distribution of forces needed to rupture biomolecules, as well as the roughness
and shapes of folding landscapes [1-3]. Such measurements have made it possible to decipher the
molecular origin of the elasticity and mechanical stability of the building blocks of life, which is
the first step in describing how they interact to function in the cellular context. The major
challenge is to provide a firm theoretical basis for interpreting the physical meaning and reliability
of the folding landscape parameters that are extracted from trajectories that project dynamics in a
multidimensional space onto the one-dimensional molecular extension, which is conjugate to $f$.

The key characteristics of the folding landscape of biomolecules that can be extracted from single
molecule force spectroscopy (SMFS) measurements are the $f$-dependent position of the transition
state (TS), the distance ($\Delta x^{\ddagger} = x_{TS} - x_0$) from the ensemble of conformations that define
the basin of attraction corresponding to the native states (NBA), and the free energy barrier
($\Delta F^{\ddagger}$). The assumption in the analysis of SMFS data is that molecular extension is a good
reaction coordinate for RNA and proteins, which implies that a single degree of freedom accurately
describes the behavior of the multiple degrees of freedom explored by the biomolecule. The
structural meaning of $\Delta x^{\ddagger}$, a parameter that is unique to SMFS, has never been made clear.
Despite many subtleties in determining $\Delta x^{\ddagger}$ from measurements [5]-[7], $x_{TS}$ is most easily
identified as a local maximum of the free energy profile at the transition mid-force $f_m$,
$F(R|f_m)$, which can be constructed by measuring the statistics of the end-to-end distance, $P(R)$,
at $f = f_m$ [1]-[3], [8], [9]. This method has been used experimentally to obtain sequence-dependent
folding landscapes of DNA hairpins [2] and, more recently, proteins [3]. To give physical meaning
to $\Delta x^{\ddagger}$ we address two questions here: (1) Does $\Delta x^{\ddagger}$ describe the structures in the
Transition State Ensemble (TSE)? The TSE is the subset of structures that have equal probability
of reaching the NBA or the unfolded basin of attraction (UBA) starting from $\Delta x^{\ddagger}$. (2) Can a
molecular tensegrity (short for tensional integrity) parameter [10] $s$, expressing the balance
between the internal compaction force $f_m = \Delta F_{UF}/\Delta x_{UF}$ and the applied tensile force
$f_c = \Delta F^{\ddagger}(f_m)/\Delta x^{\ddagger}(f_m)$ ($s = f_c/f_m$), describe the adequacy of $\Delta x^{\ddagger}(f_m)$ in
describing the TSE structures?

We use simulations of an RNA hairpin and an exactly soluble model, both of which are apparent
two-state folders as indicated by $F(R|f)$, to answer the two questions posed above. The TS is a
surface in the multidimensional folding landscape (the stochastic separatrix [11]) across which the
flux to the NBA and the Unfolded Basin of Attraction (UBA) is identical. This implies that
trajectories that start from the TS, corresponding to $\Delta x^{\ddagger}$, should have equal probability
($P_{\mathrm{fold}} \approx 0.5$ [12]) of reaching the NBA and the UBA [12], [13]. At $f = f_m$, the mean dwell
times in the NBA and the UBA are identical, so that
$\int_{0}^{R_{TS}} dR\,P(R) = \int_{R_{TS}}^{\infty} dR\,P(R) = 0.5$. However, it is unclear whether or
not the barrier top position is consistent with the requirement $P_{\mathrm{fold}} \approx 0.5$ in force
experiments.

To ascertain whether the barrier top position is consistent with the requirement
$P_{\mathrm{fold}} \approx 0.5$ in force experiments, we study folding of P5GA, an RNA hairpin,

FIG. 1. Relaxation dynamics of the ensemble of P5GA hairpins from the barrier top of $F(R|f_m)$.
(a) $R(t)$ at $f = f_m$. The TSE between $R = (4.1 - 4.2)$ nm, shown in the green box, has a uniform
$R$-distribution. (b) Starting from the TSE of hairpins in (a), 35% of the trajectories (red) reach
the UBA and 65% of the trajectories (blue) reach the NBA. (c) Free energy profile at $f = f_m$.
(d) The TSE of P5GA has a broad distribution in the number of base pairs ($N_{bp}$).



for which the NBA and UBA are equally populated at $f_m = 14.7$ pN [5], [9]. Both the free energy
profiles and the kinetics predicted by Kramers' theory show excellent agreement with the simulation
results [9], which formally establishes that the extension $R$ is a good reaction coordinate for
describing hopping kinetics at $f \approx f_m$. In practice, experimental time traces that contain a
number of transitions, such as the one in Fig. 1(a), can be used to estimate $P_{\mathrm{fold}}$. With
absorbing boundary conditions imposed at $R_N = 2$ nm and $R_U = 7.5$ nm (Fig. 1(c)), we directly
count the number of molecules, from 47 points belonging to the TS region ($R = (4.1 - 4.2)$ nm,
Fig. 1(c)), that reach $R < R_N$ (folded) and $R > R_U$ (unfolded). Although the $R$-distribution of
the 47 points is uniform (Fig. 1(a)), we obtained $P_{\mathrm{fold}} \approx 0.74$. To determine
$P_{\mathrm{fold}}$ using the ensemble method, we launched 100 trajectories from each of the 47 structures
and monitored their evolution (Fig. 1(b)) using Brownian dynamics simulations [14] with the
multidimensional energy function for the hairpin [8]. We find that $P_{\mathrm{fold}} = 0.65$ (Fig. 1(c)),
which is similar to the value obtained by analyzing the folding trajectory. An examination of the
individual trajectories reveals that many molecules that initially head toward the UBA recross the
transition barrier to reach the NBA (blue trajectories in Fig. 1(b)). Conversely, most of the
molecules that initially fall toward the NBA reach $R = R_N$ directly, showing few recrossing
events. Although the precise percentage of molecules reaching UBA

FIG. 2. Relaxation dynamics of GRM chains from the barrier top of $\beta F(R|f_m)$. (a) $\beta F(R|f_m)$
for the GRM with interior interaction. Parameters are listed in the text. Red symbols are simulation
results sampled using the chain with the Hamiltonian of Eq. (1). The solid line is the theoretical
fit. The arrow shows the position of the barrier top. (b) The distribution of the interior distance
at $R = R_{TS}$. The black line is the theoretical prediction. (c) The initial distribution of $R$
prior to the relaxation dynamics, which is very tight around $R \approx R_{TS} = 6$ nm. (d) The
probability of being folded as a function of $R$. Open circles show the exact numerical solution for
the sharp potential (Eq. (1)). Filled circles are from simulations. The solid black line shows the
approximate solution using the smoothed potential (Eq. (3)).



or NBA depends on the particular values of the boundaries ($R_N$ and $R_U$), our simulations
emphasize the importance of the recrossing dynamics, which is known to cause significant deviations
from transition state theory.
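
The counting procedure described above for a single time trace can be sketched as follows. This is a minimal illustration, not the analysis code used for P5GA: the function name `pfold_from_trace` and the toy trace are hypothetical, while the TS window and the absorbing boundaries take the values quoted in the text.

```python
import math


def pfold_from_trace(trace, ts_window, r_n, r_u):
    """Estimate P_fold from a single extension time trace R(t).

    Every frame inside the TS window counts as one starting point. Its
    outcome is whichever absorbing boundary (R < r_n: folded,
    R > r_u: unfolded) the trace reaches first afterwards.
    """
    lo, hi = ts_window
    folded = unfolded = 0
    pending = 0  # TS-window frames whose outcome is not yet known
    for r in trace:
        if r < r_n:            # absorbed in the native basin
            folded += pending
            pending = 0
        elif r > r_u:          # absorbed in the unfolded basin
            unfolded += pending
            pending = 0
        elif lo <= r <= hi:    # another visit to the TS window
            pending += 1
    total = folded + unfolded
    return folded / total if total else math.nan


# Toy trace in nm, invented purely for illustration.
toy_trace = [1.0, 4.15, 3.0, 1.5, 4.1, 5.0, 4.2, 8.0, 4.15]
p_fold = pfold_from_trace(toy_trace, ts_window=(4.1, 4.2), r_n=2.0, r_u=7.5)
```

TS-window frames that never reach a boundary before the trace ends are left out of the estimate, which is one possible convention; a real analysis would also need to decide how to treat correlated consecutive frames.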

Deviation from $P_{\mathrm{fold}} \approx 0.50$ suggests that the global coordinate $R$ alone is not
sufficient to rigorously describe the hopping kinetics of P5GA. At least one other auxiliary
coordinate is needed, and for structural reasons we take it to be the number of base pairs,
$N_{bp}$ [2]. The broad asymmetric distribution of $N_{bp}$ within the narrow TS region
($R = (4.1 - 4.2)$ nm) implied by $F(R|f_m)$ (Fig. 1(d)) shows that hopping kinetics at $f_m$ should be
described by a multidimensional (at least two-dimensional) folding landscape, even though the
$f$-dependent rates of hopping between the NBA and UBA can be reliably predicted using
$F(R|f_m)$ [9].

To further illustrate whether the $R$-coordinate alone is sufficient to determine the TSE
structures, we consider an analytically solvable Generalized Rouse Model (GRM) [9] that has a single
bond in the interior of the chain whose presence corresponds to the NBA. The GRM Hamiltonian
(Eq. (1)), where $V_c[x] = k(x^2 - c^2)/2$ for $x < c$ and $V_c[x] = 0$ for $x > c$, describes a simple
Gaussian chain under tension with an additional cutoff harmonic interaction between the interior
points $s_1$ and $s_2$. The distribution of $R$ for the GRM, with $\Delta s = |s_2 - s_1|$, is

$$P(R; \Delta s) \propto R \sinh(\beta f R)\, e^{-3R^2/[2(N - \Delta s)a^2]} \times \cdots \qquad (2)$$

To obtain $P_{\mathrm{fold}}$ we set $N = 22$ for the number of bonds, $a = 0.545$ nm for their spacing,
$\beta = 1/k_BT$, $k = 1.65$ nm$^{-2}$ and $c = 4$ nm for the strength and cutoff distance of the
interior bond interaction, respectively, and $\Delta s = 18$. For this set of parameters, the
transition mid-force (where $P(x < c) = P(x > c)$) is $f_m \approx 16.8$ pN. The two minima and the
position of the barrier top of the free energy $\beta F(R|f_m) = -\log[P(R)]$ are easily determined as
$R_N \approx 3.91$ nm, $R_U \approx 9.39$ nm, and $R_{TS} = 6.00$ nm, respectively (Fig. 2(a)).

To calculate $P_{\mathrm{fold}}$, we prepared 5000 GRM chains with $R = R_{TS} = 6$ nm, which
corresponds to the maximum in $F(R|f_m)$ (Fig. 2(a)), and allowed the system to relax to either of
the two basins of attraction, ending the simulation when the chain extension attains the value
$R_N$ or $R_U$. These simulations are performed using the Hamiltonian in Eq. (1) and not on the
simple one-dimensional profile $\beta F(R) = -\log[P(R)]$. We find that 60% (40%) of the initial
chains from $R_{TS}$ reach $R = R_N$ ($R = R_U$), which implies $P_{\mathrm{fold}} = 0.6$. Similar to
P5GA, the GRM dynamics projected onto the $R$-coordinate using both sets of parameters exhibit a
number of recrossing events.
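
For contrast with the chain simulations above, the same launch-and-absorb protocol can be run on a purely one-dimensional profile. The sketch below is an assumption-laden toy, not the GRM: it uses a hypothetical symmetric double well $\beta U(x) = h(x^2 - 1)^2$, units with $k_BT = 1$ and unit diffusion constant, and a simple Euler scheme for overdamped Langevin dynamics.

```python
import math
import random


def committor_from_barrier(n_traj=1000, height=3.0, dt=1e-3, bound=0.9,
                           max_steps=200_000, seed=0):
    """Fraction of trajectories started at the barrier top x = 0 of
    beta*U(x) = height * (x^2 - 1)^2 that are absorbed at x = -bound
    (labelled 'folded') before x = +bound ('unfolded').
    Units: k_B T = 1 and diffusion constant D = 1 (both assumptions).
    """
    rng = random.Random(seed)
    noise = math.sqrt(2.0 * dt)  # standard deviation of one Brownian step
    folded = decided = 0
    for _ in range(n_traj):
        x = 0.0
        for _ in range(max_steps):
            force = -4.0 * height * x * (x * x - 1.0)  # -d(beta U)/dx
            x += force * dt + noise * rng.gauss(0.0, 1.0)
            if x <= -bound:
                folded += 1
                decided += 1
                break
            if x >= bound:
                decided += 1
                break
    return folded / decided


p_fold_1d = committor_from_barrier()
```

By symmetry this toy returns a value close to 0.5 at the barrier top, up to sampling noise. The departure from 0.5 reported for the GRM arises because those simulations use the full chain Hamiltonian rather than a one-dimensional profile.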

To understand the relation between $x$ (the structural coordinate which specifies the NBA and UBA
in the GRM) and the chain extension $R$ (the pulling coordinate that is conjugate to $f$), we
approximate the sharp interaction in Eq. (1) by a smoothed potential,

$$\beta V_c[x] \approx \beta V_s[x] = -\log\left(e^{-\beta k(x^2 - c^2)/2} + 1\right), \qquad (3)$$

by taking advantage of the clear separation between the UBA and NBA (Fig. 2(a)). Defining
$\mathcal{N}_k = \langle e^{-\beta k(x^2 - c^2)/2}\,\delta(|\mathbf{R}| - R)\rangle$, with $\langle\cdots\rangle$ denoting an
average over the Gaussian backbone, we can compute approximately
$P_R(x < c) = \int_0^c dx\,\langle\delta(|\mathbf{R}| - R)\rangle \approx \mathcal{N}_k/(\mathcal{N}_{k=0} + \mathcal{N}_k)$,
where the average in the integrand is taken over the GRM potential in Eq. (3). The probability of
bond formation satisfies Eq. (4), where $\kappa = Na^2\beta k/3$, $\sigma = \Delta s/N$, and
$\lambda = 1 + \kappa\sigma(1 - \sigma)$. At $R = 6$ nm, Eq. (4) gives $P_R(x < c) \approx 0.402 \neq 0.5$.

FIG. 3. $P_{\mathrm{unfold}}$ as a function of $s = \Delta F^{\ddagger}/f_m\Delta x^{\ddagger}$ for a variety of GRM
parameters (colored circles). Red corresponds to small $\beta k \sim (0.1 - 1)$ nm$^{-2}$, purple to
intermediate $\beta k \sim (1 - 50)$ nm$^{-2}$, and blue to high $\beta k \sim (50 - 500)$ nm$^{-2}$. The
point sizes indicate the magnitude of the midpoint force, with the smallest points at $f_m = 5$ pN
and the largest at $f_m = 25$ pN. The large square indicates the result of the P5GA simulations
with $P_{\mathrm{unfold}} = 0.35$. The gray line is the theoretical prediction
$P_{\mathrm{unfold}}(s) = (1 + s)^{-1}/2$ for $s \gg 1$. The solid line in the inset shows the predicted
$P_{\mathrm{unfold}}(s)$ for $s \ll 1$ (Eq. (5)) using the DNA hairpin free energy profiles (not
deconvolved) in [2]. The dashed line is for the leucine zipper, which has two barriers, using data
from [3]. As explained in the Supplementary Information, numbers 1 and 2 are for the NBA $\to$ I and
I $\to$ UBA transitions, respectively. Because of different $k_U$ values (Eq. (5)), the dashed and
solid lines do not coincide.



Thus, when the dynamics is initiated from the top of the apparent free energy barrier in the $R$
variable, the internal coordinate ($x$) is populated primarily with unfolded conformations
(Fig. 2(b)). Despite the fact that the UBA is primarily populated at $R = R_{TS}$, we find that
$P_{\mathrm{fold}} \approx 0.6$, so that the NBA will be predominantly populated as the trajectories
progress. The midpoint of the transition ($P_R(x < c) = 0.5$) occurs at
$R_{mid} = [(Na^2\lambda/3\kappa\sigma^2)(\beta k c^2 - 3\log\lambda)]^{1/2} = 5.91$ nm, which deviates slightly
from $R_{TS} = 6.00$ nm. The results from $V_s[x]$, which are in excellent agreement with the
simulations as well as the numerical results using $V_c[x]$ (Fig. 2), also suggest that hopping
kinetics in the GRM involves coupling between $x$ and $R$. Thus, even in this simple system,
accurate location of the TSE should consider two-dimensional free energy profiles (see [6] and [16]).
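
The midpoint extension quoted above can be checked directly from the stated parameters. In the sketch below, `r_mid` evaluates the closed form given in the text, reading the bond strength as $\beta k = 1.65$ nm$^{-2}$ (an assumption about units). The function `p_bond` is NOT taken from the text: it is an assumed logistic form for $P_R(x < c)$, chosen only because it has its midpoint at `r_mid()` and reproduces the quoted value near $R = 6$ nm.

```python
import math

# Parameters quoted in the text; beta*k = 1.65 nm^-2 is an assumed reading of "k".
N = 22          # number of bonds
a = 0.545       # bond spacing (nm)
beta_k = 1.65   # interior bond strength in units of k_B T (nm^-2)
c = 4.0         # cutoff distance (nm)
delta_s = 18    # chain separation of the two interior points

kappa = N * a**2 * beta_k / 3.0
sigma = delta_s / N
lam = 1.0 + kappa * sigma * (1.0 - sigma)


def r_mid():
    """Extension (nm) at which P_R(x < c) = 0.5, from the closed form in the text."""
    prefactor = N * a**2 * lam / (3.0 * kappa * sigma**2)
    return math.sqrt(prefactor * (beta_k * c**2 - 3.0 * math.log(lam)))


def p_bond(R):
    """ASSUMED logistic form for P_R(x < c); a hypothetical stand-in for Eq. (4),
    constructed so that p_bond(r_mid()) = 0.5."""
    exponent = (1.5 * math.log(lam) - 0.5 * beta_k * c**2
                + 3.0 * kappa * sigma**2 * R**2 / (2.0 * N * a**2 * lam))
    return 1.0 / (1.0 + math.exp(exponent))
```

With these inputs `r_mid()` evaluates to roughly 5.91 nm and `p_bond(6.0)` to roughly 0.40, in line with the values quoted in the text, which supports the reading of the symbols $\kappa$, $\sigma$, and $\lambda$ used here.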

To answer the second question we introduce a molecular tensegrity parameter, which is a ratio of
the tensile force and a force that determines the stability of the biopolymer. The limit of
mechanical stability of the NBA is determined by the critical unbinding force
$f_c = \Delta F^{\ddagger}(f_m)/\Delta x^{\ddagger}(f_m)$. At the midpoint force $f = f_m$, the tensegrity parameter
$s = f_c/f_m$ determines whether the applied external tension is sufficient to overcome the
stability of the NBA. Models that approximate the free energy profiles using a cubic or cusp
potential [7] could alter $f_c$ or $f_m$ from the form suggested above. However, because $s$ involves
the ratio of the two forces, the precise numerical factors are not relevant. Barrier crossings from
the NBA over the TS are governed by the competition between $f_c$ and the applied force $f = f_m$.
For $f_c \gg f_m$, barrier recrossing from the NBA to the TS will be extremely rare, while if
$f_c \ll f_m$, barrier crossing events will be common. We would therefore expect that, as the ratio
$s = f_c/f_m$ increases, barrier recrossings from within the NBA to the TS will decrease and the
probability of reaching $R = R_N$ will increase. Thus, $P_{\mathrm{fold}}$, the structural link to
$\Delta x^{\ddagger}$, should be determined by the experimentally measurable tensegrity parameter $s$.

To confirm this expectation, we ran 20 different sets of GRM parameters (1000 runs each), with
$f_m$ ranging between 5 and 25 pN, $\beta k$ ranging from 0.1 to 600 nm$^{-2}$, $c$ between 0.26 and
6 nm, and $\Delta s$ between $N/2$ and $N$. Fig. 3 shows that $P_{\mathrm{unfold}} = 1 - P_{\mathrm{fold}}$
decreases monotonically with $s = f_c/f_m$. The dependence of $P_{\mathrm{unfold}}$ on $s$ (Fig. 3) can be
derived by considering diffusion in a 1D version of the GRM potential in Eq. (1),
$\beta U(x) = -fx + \frac{1}{2}k_U x^2 + V_c(x)$, with $k_U$ being the curvature at the bottom of the UBA.
Assuming constant diffusivity,
$P_{\mathrm{unfold}} = \int_{x_N}^{c} dx'\,e^{\beta U(x')} \big/ \int_{x_N}^{x_U} dx'\,e^{\beta U(x')}$,
where the positions of the transition barrier, native well, and unfolded well are given by $c$,
$x_N$, and $x_U$, respectively. For $s \gg 1$, we obtain $P_{\mathrm{unfold}} = (1 + s)^{-1}/2$, with the
entire dependence on the energy landscape encoded in $s$. The analytical result, plotted as a solid
curve in Fig. 3, captures the overall trend of the GRM and P5GA simulations.
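
The integral ratio above is straightforward to evaluate numerically. The following sketch does so for the cusp-shaped 1D potential, taking $f$, $k_U$, and $k$ in units where $k_BT = 1$ and lengths in nm. The parameter values are hypothetical and chosen only so that $x_N < c < x_U$; they are not the 20 parameter sets used for Fig. 3.

```python
import math


def cusp_potential(x, f, k_u, k, c):
    """beta*U(x) = -f*x + k_u*x^2/2 + V_c(x), with V_c the cutoff harmonic bond."""
    v_c = 0.5 * k * (x * x - c * c) if x < c else 0.0
    return -f * x + 0.5 * k_u * x * x + v_c


def integrate_exp(u, a, b, u_ref, n=20000):
    """Trapezoidal estimate of the integral of exp(u(x) - u_ref) over [a, b]."""
    h = (b - a) / n
    total = 0.5 * (math.exp(u(a) - u_ref) + math.exp(u(b) - u_ref))
    for i in range(1, n):
        total += math.exp(u(a + i * h) - u_ref)
    return total * h


def tensegrity_and_p_unfold(f, k_u, k, c):
    """Return (s, P_unfold) for a walker released at the cusp x = c."""
    x_n = f / (k_u + k)   # minimum of the native well
    x_u = f / k_u         # minimum of the unfolded well
    if not x_n < c < x_u:
        raise ValueError("parameters must satisfy x_N < c < x_U")

    def u(x):
        return cusp_potential(x, f, k_u, k, c)

    u_ref = u(c)  # subtract the barrier height to keep the exponentials bounded
    native_side = integrate_exp(u, x_n, c, u_ref)
    unfolded_side = integrate_exp(u, c, x_u, u_ref)
    p_unfold = native_side / (native_side + unfolded_side)
    barrier = u(c) - u(x_n)               # Delta F in units of k_B T
    s = (barrier / (c - x_n)) / f         # s = f_c / f_m with f_m = f
    return s, p_unfold


# Hypothetical parameter scan: stiffer interior bonds give larger s.
scan = [tensegrity_and_p_unfold(f=4.0, k_u=0.2, k=k, c=4.0) for k in (1.0, 5.0, 20.0)]
```

In this scan `s` grows with `k` while the computed unfolding probability falls, and for the stiffest bond the result lies close to $(1 + s)^{-1}/2$. The code itself makes no large-$s$ approximation; it only evaluates the ratio of integrals.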

For $s \ll 1$, $P_{\mathrm{unfold}}$ can be approximated by Eq. (5), where
$\Phi = 4f_m/\sqrt{2\pi k_BT k_U}$ and $k_U = d^2U/dx^2$ at $x = x_U$. This relation can be used to
predict the degree to which extension is a good reaction coordinate for DNA hairpins. We calculated
$s$ from the measured energy landscapes in [2] and obtained $P_{\mathrm{unfold}}$ using Eq. (5). We took
$f_m \approx 12.5$ pN and $\beta k_U \approx 0.2$ nm$^{-2}$. The predicted $P_{\mathrm{fold}}$ (inset in Fig. 3)
varies between 0.54 and 0.61 for the DNA hairpin sequences A-D, close to the ideal value of 1/2,
indicating that extension is a reasonable coordinate except possibly for sequence C, which has a
T:T base pair (bp) mismatch 7 bp from the stem. The limitation of extension as a reaction coordinate
for this sequence is consistent with the observation that folding of sequence C has an intermediate,
as indicated by the three minima in $F(R|f_m)$ [2].

To illustrate that the theory is applicable when there are multiple barriers [4], we analyze the
data for the leucine zipper, which unfolds by populating an intermediate. In this case there are
two tensegrity parameters, $s_1$ (= 0.05) and $s_2$ (= 0.15) (see Supplementary Information). The
predicted $P_{\mathrm{unfold}}$ values show that extension is a good reaction coordinate for the
NBA $\to$ I transition but is less so for the I $\to$ UBA transition (Fig. 3).

Our results can be confirmed by analyzing long-time folding trajectories $R(t)$, as long as
multiple hopping events occur. Because $\Delta x^{\ddagger}$ is $f$-dependent, it follows that tensegrity can
be altered by changing $f$. Thus, extension may be a good reaction coordinate over a certain range
of $f$ but may not remain so under all loading conditions.

Molecular Tensegrity parameters and
Energy Landscapes with two barriers 

Here we provide details of how the theory in the main text can be extended to systems with multiple
barriers, using GCN4 as an example. The equilibrium time series for the GCN4 leucine zipper system
was measured in a constant extension dual optical trap setup [1]. We can express the bead-bead
separation along the stretching direction as $x_0 + x(t)$, where $x_0$ is the average separation and
$x(t)$ the instantaneous deviation from the mean. Constructing histograms from the $x(t)$ trajectory
yields the probability distribution $\mathcal{P}_{exp}(x)$. In order to calculate the tensegrity
parameters, we need the measured free energy profiles in the constant force (CF) ensemble, which
are obtained from



$$\mathcal{P}_0^{CF}(x) = C_0 \exp(\beta k_{trap} x^2/4)\,\mathcal{P}_{exp}(x), \qquad (6)$$



where $k_{trap} = 0.25$ pN/nm is the optical trap stiffness [1], $\beta = 1/k_BT$, and $C_0$ is a
normalization constant. The resulting distribution $\mathcal{P}_0^{CF}(x)$ is in the constant force
ensemble at a tension equal to $f_0$, the average force in the experimental trajectory. To obtain
the distribution $\mathcal{P}^{CF}(x)$ at $f = f_0 + \delta f$, we use



$$\mathcal{P}^{CF}(x) = C \exp(\beta x\,\delta f)\,\mathcal{P}_0^{CF}(x), \qquad (7)$$



where $C$ is another normalization constant. The associated free energy is
$\beta F(x) = -\ln \mathcal{P}^{CF}(x)$.
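
The force-tilting step of Eq. (7) and the conversion to a free energy can be sketched for a binned distribution as follows. The function names, the three-bin example, and the value of $k_BT$ (4.114 pN nm, roughly room temperature) are assumptions made for illustration; the text does not state the temperature.

```python
import math

KBT_PN_NM = 4.114  # assumed k_B T in pN*nm (about 298 K)


def tilt_distribution(xs, p0, delta_f_pn, kbt=KBT_PN_NM):
    """Reweight a constant-force histogram p0(x) to the force f0 + delta_f.

    Implements P(x) = C * exp(beta * x * delta_f) * p0(x) on a set of bins
    and fixes C by normalising the bin weights. All p0 entries must be > 0.
    """
    log_w = [math.log(p) + x * delta_f_pn / kbt for x, p in zip(xs, p0)]
    shift = max(log_w)                      # avoid overflow in exp
    weights = [math.exp(lw - shift) for lw in log_w]
    norm = sum(weights)
    return [w / norm for w in weights]


def free_energy(p):
    """beta*F(x) = -ln P(x), bin by bin."""
    return [-math.log(v) for v in p]


# Hypothetical three-bin histogram (x in nm), for illustration only.
xs = [-10.0, 0.0, 10.0]
p0 = [0.4, 0.2, 0.4]
tilted = tilt_distribution(xs, p0, delta_f_pn=0.5)
beta_f = free_energy(tilted)
```

A positive `delta_f_pn` shifts weight toward larger extensions, so the well at larger $x$ becomes deeper in `beta_f`, while `delta_f_pn = 0` returns the input distribution unchanged.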

Fig. 4 shows $\beta F(x)$ at 13.8 pN. The three states in the free energy landscape are the NBA, an
intermediate I, and the UBA. At $f = 13.8$ pN, we see well-defined wells corresponding to each of
these states, with minima at $x_N$, $x_I$, and $x_U$, respectively. At this force the probabilities
of being in the NBA and I coincide (see below). The curvatures
$k_\alpha = (d^2F/dx^2)_{x = x_\alpha}$ of these wells, $\alpha$ = NBA, I, UBA, can be calculated numerically;
the values are very similar for all three wells, and so for simplicity we take
$k_\alpha \approx k = 0.10\,k_BT/\mathrm{nm}^2$ (for both $f = 11.5$ and 13.8 pN). The transition barrier
between the NBA and I is at position $c_1$, and that between I and the UBA is at position $c_2$.

The first step in obtaining the tensegrity for each transition is to determine the transition
mid-force $f_m$. We estimate the probability weight $P_\alpha$ associated with each well, assuming that
the main contribution comes from the region around the minimum. In this case,



$$P_\alpha \propto \exp(-\beta F(x_\alpha)). \qquad (8)$$



The mid-force condition for NBA to I yields a force $f_m = 13.8$ pN, where $P_{NBA} = P_I$.
Similarly, the condition for I to UBA yields the force $f_m = 11.5$ pN, where $P_I = P_{UBA}$.

FIG. 4. Constant force free energy profile obtained using experimental data for the leucine zipper
(Fig. 3B in [1]) and Eqs. (6) and (7) at $f = 13.8$ pN, the midpoint of the transition between the
NBA and I.



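The tensegrity parameters then follow from the shape of the free energy landscape at each of these
$f_m$:

Case I: NBA to I, $f = f_m = 13.8$ pN

$$\begin{aligned}
\Delta F^{\ddagger} &= F(c_1) - F(x_N) = 0.90\,k_BT \\
\Delta x^{\ddagger} &= c_1 - x_N = 5.4\ \mathrm{nm} \\
s &= \Delta F^{\ddagger}/(f_m \Delta x^{\ddagger}) = 0.05
\end{aligned} \qquad (9)$$

Case II: I to UBA, $f = f_m = 11.5$ pN

$$\begin{aligned}
\Delta F^{\ddagger} &= F(c_2) - F(x_I) = 4.1\,k_BT \\
\Delta x^{\ddagger} &= c_2 - x_I = 9.8\ \mathrm{nm} \\
s &= \Delta F^{\ddagger}/(f_m \Delta x^{\ddagger}) = 0.15
\end{aligned} \qquad (10)$$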



Since both $s$ values are small, we use the $P_{\mathrm{unfold}}$ expression for small $s$ (Eq. (11)) to
get the corresponding probabilities, where $\Phi = 4\beta f_m/\sqrt{2\pi\beta k}$. We plot the results in
the inset of Fig. 3 in the main text. The dashed curve corresponds to $f_m = 12.6$ pN (the average
of 11.5 and 13.8 pN), since the difference between the two $f_m$ does not make a noticeable
difference in the predicted $P_{\mathrm{unfold}}$ on the scale of the figure.
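
The two tensegrity values quoted for GCN4 follow from simple arithmetic once the barrier heights are converted from $k_BT$ to pN nm. The short check below assumes $k_BT = 4.114$ pN nm (about 298 K), since the temperature is not stated in the text; the helper name `tensegrity` is hypothetical.

```python
KBT_PN_NM = 4.114  # assumed k_B T in pN*nm (about 298 K); the text does not state T


def tensegrity(barrier_kbt, dx_nm, f_m_pn, kbt=KBT_PN_NM):
    """s = Delta F / (f_m * Delta x), with the barrier given in units of k_B T."""
    return barrier_kbt * kbt / (f_m_pn * dx_nm)


s1 = tensegrity(barrier_kbt=0.90, dx_nm=5.4, f_m_pn=13.8)  # NBA -> I
s2 = tensegrity(barrier_kbt=4.1, dx_nm=9.8, f_m_pn=11.5)   # I -> UBA
```

Rounded to two decimals these reproduce the values 0.05 and 0.15 given in Eqs. (9) and (10), and the result is insensitive to small changes in the assumed temperature.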